Non-parametric speaker turn segmentation of meeting data
Abstract
An extension of the conventional speaker segmentation framework is presented for a scenario in which a number of microphones record the activity of speakers present at a meeting (one microphone per speaker). Although each microphone can receive speech from both the participant wearing the microphone (local speech) and other participants (cross-talk), the recorded audio can be broadly classified into three classes: local speech, cross-talk, and silence. This paper proposes a technique that uses cross-correlations, the values of their maxima, and energy differences as features to identify and segment speaker turns. In particular, we have used classical cross-correlation functions, time smoothing and, in part, temporal constraints to sharpen and disambiguate timing differences between microphone channels that may be dominated by noise and reverberation. Experimental results show that the proposed technique can be successfully used for speaker segmentation of data collected from a number of different setups.
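The three features named in the abstract (cross-correlation, the value of its maximum, and the energy difference between channels) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the lag convention, the thresholds, and the toy decision rule in `classify_frame` are all assumptions made for illustration, and the time smoothing and temporal constraints mentioned in the abstract are omitted.

```python
import math


def energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def xcorr_peak(a, b, max_lag):
    """Return (peak value, lag) of the normalized cross-correlation of a and b.

    Assumed convention: c(lag) = sum_n a[n] * b[n + lag], so a positive peak
    lag means that b looks like a delayed copy of a.
    """
    norm = math.sqrt(energy(a) * energy(b))
    if norm == 0.0:
        return 0.0, 0
    best_val, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(a)):
            m = n + lag
            if 0 <= m < len(b):
                acc += a[n] * b[m]
        val = acc / norm
        if val > best_val:
            best_val, best_lag = val, lag
    return best_val, best_lag


def energy_diff_db(a, b, eps=1e-12):
    """Energy of a relative to b, in decibels (eps avoids log of zero)."""
    return 10.0 * math.log10((energy(a) + eps) / (energy(b) + eps))


def classify_frame(this_channel, other_channel, max_lag=8, silence_energy=1e-3):
    """Toy three-way decision for one frame of one channel.

    Hypothetical rule, not taken from the paper: a frame is 'silence' if its
    mean energy is below a threshold, 'local speech' if this channel is louder
    and the other channel lags behind it, and 'cross-talk' otherwise.
    """
    if energy(this_channel) / len(this_channel) < silence_energy:
        return "silence"
    _, lag = xcorr_peak(this_channel, other_channel, max_lag)
    if energy_diff_db(this_channel, other_channel) > 0.0 and lag >= 0:
        return "local speech"
    return "cross-talk"


# Synthetic two-channel frame: the second channel hears the same pulse,
# attenuated by half and delayed by 3 samples.
pulse = [0, 0, 0, 1, -2, 3, -1, 2, -3, 1, 0, 0, 0, 0, 0, 0]
local = [float(x) for x in pulse]
other = [0.0, 0.0, 0.0] + [0.5 * x for x in local[:-3]]

print(xcorr_peak(local, other, 8))
print(energy_diff_db(local, other))
print(classify_frame(local, other), classify_frame(other, local))
```

In this sketch the sign of the peak lag and the sign of the energy difference point the same way for a clean synthetic signal. With noise and reverberation the two cues can disagree, which is presumably why the abstract adds smoothing and temporal constraints.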
Similar resources
Varying Microphone Patterns for Meeting Speech Segmentation Using Spatial Audio Cues
Meetings, common to many business environments, generally involve stationary participants. Thus, participant location information can be used to segment meeting speech recordings into each speaker’s ‘turn’. The authors’ previous work proposed the use of spatial audio cues to represent the speaker locations. This paper studies the validity of using spatial audio cues for meeting speech segmentat...
Multi-speaker meeting audio segmentation
This paper presents the segmentation of multi-speaker meeting audio into four different classes: local speech, crosstalk, overlapped speech and non-speech sounds. First, the Bayesian Information Criterion (BIC) segmentation method is used to pre-segment the meeting according to speaker change points. Then, harmonicity information is integrated into acoustic features to differentiate speech from non...
The NIST 2004 Spring Rich Transcription Evaluation: Two-axis Merging Strategy in the Context of Multiple Distant Microphone Based Meeting Speaker Segmentation
This paper presents the ELISA speaker segmentation approach applied to multiple audio channel meeting recordings in the framework of NIST RT’04s meeting (spring) evaluation campaign. As done for BN data speaker segmentation, the ELISA “meeting” system involves two speaker segmentation systems developed individually by the CLIPS and LIA laboratories. The main originality consists in a “two-axis”...
Automatic social role recognition and its application in structuring multiparty interactions
Automatic processing of multiparty interactions is a research domain with important applications in content browsing, summarization and information retrieval. In recent years, several works have been devoted to finding regular patterns that speakers exhibit in a multiparty interaction, also known as social roles. Most of the research in the literature has generally focused on recognition of scenario s...
Speaker adaptation of language models for automatic dialog act segmentation of meetings
Dialog act (DA) segmentation in meeting speech is important for meeting understanding. In this paper, we explore speaker adaptation of hidden event language models (LMs) for DA segmentation using the ICSI Meeting Corpus. Speaker adaptation is performed using a linear combination of the generic speaker-independent LM and an LM trained on only the data from individual speakers. We test the method ...